Comparing between Arabic Text Clustering using K Means and K Mediods

نویسنده

  • Mahmud S. Alkoffash
چکیده

In this study we have implemented the Kmeans and Kmediods algorithms in order to make a practical comparison between them. The system was tested using a manual set of clusters that consists from 242 predefined clustering documents. The results showed a good indication about using them especially for Kmediods. The average precision and recall for Kmeans compared with Kmediods are 0.56, 0.52, 0.69 and 0.60 respectively. we have also extract feature set of keywords in order to improve the performance, the result illustrates that two algorithms can be applied to Arabic text, a sufficient number of examples for each category, the selection of the feature space, the training data set used and the value of K can enormously affect the accuracy of clustering.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

متن کامل

Comparing Model-based Versus K-means Clustering for the Planar Shapes

‎In some fields‎, ‎there is an interest in distinguishing different geometrical objects from each other‎. ‎A field of research that studies the objects from a statistical point of view‎, ‎provided they are‎ ‎invariant under translation‎, ‎rotation and scaling effects‎, ‎is known as the statistical shape analysis‎. ‎Having some objects that are registered using key points on the outline...

متن کامل

D Partition Algorithms – A Study and Emergence of Mining Projected Clusters in High - Dimensional Dataset

High-dimensional data has a major challenge due to the inherent sparsity of the points. Existing clustering algorithms are inefficient to the required similarity measure is computed between data points in the full-dimensional space. In this work, a number of projected clustering algorithms have been analyzed. However, most of them encounter difficulties when clusters hide in subspaces with very...

متن کامل

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...

متن کامل

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012